The Price of Dirty Data


BY PAUL A. STRASSMANN 


TOOL: Why Messy Records Hurt the Wallet 


Where do you begin if you want to capture the potential benefits of transforming your systems through consolidation,
conversion to a service-oriented architecture, and enterprisewide rules that ensure the interoperability and
security of applications, externally as well as internally? As a first step, you need to adopt standard definitions for metadata
(the data about data) to guide the processing of all data inputs, regardless of whether they come from legacy or already transformed
applications. The worksheet below shows the expenses a bank incurs when incomplete, inconsistent and inaccurate
records sit in multiple databases that cannot be easily integrated. The example also assumes a dramatic reduction in the
number of databases and applications in use, because that's the key way to reduce errors.

INSTRUCTIONS: Start with the number of data sources in your organization, then review samples to estimate the completeness and accuracy
of the data. Follow the directions described at left, and fill in your own numbers under "Your Company." To get an interactive version of this
worksheet, see Go.BASELINEMAG.COM/JULO6.


                                                                        EXAMPLE          YOUR COMPANY

Number of applications
Data sources, e.g., terminals, magnetic card recorders or counters attached to sensors
Median number of data elements (names, ID numbers, serial numbers of devices, etc.) entered per transaction
Median number of daily transactions per source
D  Data elements entered into system per year (A x B x C x 365)         53,260,800,000


E  Completeness of data entry, based on samples taken from suspended transactions or error registers. Example: A phone number that's missing an area code.
F  Accuracy of data, based on samples compared to templates from a corporate data dictionary. Example: A misspelled name or address.
G  Duplication of data. Example: John Smith makes two reservations, so his name shows up twice in a reservation database.
H  Conformity of database entries to defined business rules for security, authentication, etc.


I  Data requiring corrections per year (D x ((1-E) + (1-F)))            2,663,040,000
J  Incorrect additions to database per year ((D x G) + (D x (1-H)))     7,989,120,000
K  Data audit and remediation. Based on cost for automated intervention per error.*     $0.10
L  Administrative cost per defect                                       $0.08

M  Annual cost to manage defective data ((I x K) + (J x L))             $905,433,600


*The cost of fixing each defect goes up as quality increases and defects become rarer.
For example, fixing a defect at 99.9999% quality is harder than fixing one at 99.5% quality.
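
The worksheet arithmetic can be checked with a short script. The following is a minimal sketch in Python, not part of the original worksheet. The function name dirty_data_cost and its parameter names are hypothetical. The values for D, K and L are the example figures printed above. The individual rates for E, F, G and H are assumptions chosen only because they reproduce the printed totals for the corrections and incorrect-additions rows; any rates with a combined completeness and accuracy shortfall of 5% and a combined duplication and nonconformity rate of 15% would give the same result.

```python
from decimal import Decimal


def dirty_data_cost(d, e, f, g, h, k, l):
    """Apply the worksheet formulas and return (corrections, bad_additions, annual_cost).

    d: data elements entered per year (row D)
    e, f: completeness and accuracy rates, as fractions between 0 and 1
    g: duplication rate; h: conformity rate, as fractions between 0 and 1
    k: remediation cost per error; l: administrative cost per defect
    """
    # Decimal avoids binary floating-point rounding on money amounts.
    d, e, f, g, h, k, l = (Decimal(str(x)) for x in (d, e, f, g, h, k, l))
    corrections = d * ((1 - e) + (1 - f))               # D x ((1-E) + (1-F))
    bad_additions = d * g + d * (1 - h)                 # (D x G) + (D x (1-H))
    annual_cost = corrections * k + bad_additions * l   # first result x K + second x L
    return corrections, bad_additions, annual_cost


# D, K and L come from the worksheet example; E, F, G and H are assumed values.
corrections, bad_additions, annual_cost = dirty_data_cost(
    d=53_260_800_000, e=0.97, f=0.98, g=0.05, h=0.90, k=0.10, l=0.08
)
print(f"Data requiring corrections: {corrections:,.0f}")
print(f"Incorrect additions:        {bad_additions:,.0f}")
print(f"Annual cost:                ${annual_cost:,.2f}")
```

To use it with your own figures, replace the arguments with the numbers from the "Your Company" column.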